We prove that the rate of convergence of quasi-Newton methods is the golden
section ratio $(1 + \sqrt{5})/2$.
1.  Introduction


Newton's method for the minimization of $f: R^n \to R$ requires computation and
inversion of the Hessian matrix at each iteration. Quasi-Newton methods approximate
the Hessian or its inverse by first order (i.e. gradient) information. These methods
extend the classical secant (or False Position) method to $n > 1$ (see e.g.
Luenberger [7]). They are known to converge to the solution superlinearly (see Dennis
and Moré [3] and the references there). Thus, it is commonly accepted (e.g. [3])
that the price paid for the approximation of the Hessian by gradient information is
a reduction from second order to superlinear convergence.

In [1,2], we developed new tools for the analysis of the rate of convergence of
interpolatory algorithms. We use them in this paper to prove that the rate of
convergence of a class of quasi-Newton methods, without line search and without
restart, is in fact given by the golden section ratio $(1 + \sqrt{5})/2 \approx 1.618$.
We note in passing that no other tools exist enabling one to establish convergence
rates between superlinear and quadratic.


2.  Rate  of  Convergence  Analysis 

Newton's method consists of the iteration
$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$. Here $\nabla f$, $\nabla^2 f$
are the gradient and Hessian of $f$ respectively and all vectors are column
vectors. Quasi-Newton methods replace this equation with

(1)  $x_{k+1} = x_k - \alpha_k H_k \nabla f(x_k)$,


where the matrix $H_k$ approximates the inverse of the Hessian, and the step-size
$\alpha_k \in R$ is obtained by an exact or approximate line-search. The matrix
$H_{k+1}$ is required to satisfy

(2)  $H_{k+1} y_k = s_k$

where

(3)  $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$,  $s_k = x_{k+1} - x_k$.

For a thorough discussion of these methods see Dennis and Moré [3].
Henceforth, we will assume $\alpha_k = 1$ for all $k$, i.e., no line search is
performed, so that the iteration formula becomes

(4)  $x_{k+1} = x_k - H_k \nabla f(x_k)$.

In the one dimensional case ($n = 1$), equation (2) implies

$H_{k+1} = \frac{x_{k+1} - x_k}{f'_{k+1} - f'_k}$

with $f'_k = f'(x_k)$, so that (4) is the classical secant or False Position method
(see Luenberger [7]). For this reason equation (2) is called the secant equation.
Other names, e.g. quasi-Newton equation, are also in use. This equation plays a
fundamental role in the classical theory of quasi-Newton methods as well as in our
analysis.
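The one dimensional case can be tried numerically. The following is a minimal sketch, not part of the original analysis: the test function $f(x) = e^x - 2x$, the starting points and the helper names are assumptions chosen only for illustration. It runs the secant iteration on $f'$ and prints the usual three-point estimates of the convergence order, which should settle near 1.6 before round-off takes over.

```python
import math

def secant_minimize(df, x0, x1, steps):
    """Iteration (4) for n = 1 with H_k = (x_k - x_{k-1}) / (f'_k - f'_{k-1})."""
    xs = [x0, x1]
    for _ in range(steps):
        g_prev, g_curr = df(xs[-2]), df(xs[-1])
        if g_curr == g_prev:          # secant slope undefined; stop early
            break
        h = (xs[-1] - xs[-2]) / (g_curr - g_prev)   # scalar H_k from the secant equation
        xs.append(xs[-1] - h * g_curr)
    return xs

def order_estimates(errors, floor=1e-12):
    """Ratios log(e_{k+1}/e_k) / log(e_k/e_{k-1}), skipping errors near round-off."""
    out = []
    for a, b, c in zip(errors, errors[1:], errors[2:]):
        if min(a, b, c) > floor:
            out.append(math.log(c / b) / math.log(b / a))
    return out

# Hypothetical test problem: f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and x* = ln 2.
df = lambda x: math.exp(x) - 2.0
xs = secant_minimize(df, 0.0, 1.0, steps=6)
errors = [abs(x - math.log(2.0)) for x in xs]
print(order_estimates(errors))
```

The early estimates fluctuate because the asymptotic regime has not yet been reached; only the last few are meaningful.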

The formulas expressing $H_{k+1}$ in terms of $H_k$ and the data are called
updating formulas. Different updating formulas give rise to a variety of
quasi-Newton methods. In addition, there are quasi-Newton methods which replace
equations (2) and (4) with

(5)  $x_{k+1} = x_k - B_k^{-1} \nabla f(x_k)$,

(6)  $B_{k+1} s_k = y_k$,

together with an appropriate updating formula for the matrix $B_k$.




We recall our basic results on hyperosculatory interpolation algorithms developed
in [1,2]. The interpolation algorithm studied there generates a sequence
$\{x_k\}$ as follows. Let $s \ge 1$, $m \ge 0$ be fixed integers, and let
$T: R^n \to R$ depend on $r = s(m+1)$ parameters. Given $m+1$ approximants
$x_k, \ldots, x_{k-m}$ to the solution $x^*$ of $\nabla f(x) = 0$, we use
$x_k, \ldots, x_{k-m}$ to construct a new approximant $x_{k+1}$. First we
interpolate $f$ by $T$ requiring

(7)  $T^{(i)}(x_{k-j}) = f^{(i)}(x_{k-j})$,  $j = 0, \ldots, m$,  $i = 0, \ldots, s-1$.

Here $f^{(1)} = \nabla f$, $f^{(2)} = \nabla^2 f$, etc. The new point $x_{k+1}$ is
determined by

(8)  $\nabla T(x_{k+1}) = 0$.

In [1], we proved that the sequence $\{x_k\}$ generated by this algorithm converges
(locally) to the solution with Q- and R-rates of convergence at least $p$, where
$p$ is the unique positive solution of the equation

$t^{m+1} - (s-1)t^m - s \sum_{j=0}^{m-1} t^j = 0$

(the sum is taken as zero if $m = 0$). For the definitions of the Q- and R-rates of
convergence and their properties see [9, §9]. The derivation of this result is based
on the analysis in Traub [11], where a difference relation for the errors
$\|x_k - x^*\|$ is used to compute the rate.
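The rate $p$ is easy to evaluate for given $s$ and $m$. The sketch below is an illustration only: the function names are hypothetical, and bisection on $[0, s+1]$ is one convenient way to locate the positive root (the polynomial is negative at 0 whenever $s + m \ge 2$ and positive at $s+1$).

```python
def rate_polynomial(t, s, m):
    """g(t) = t^(m+1) - (s-1) t^m - s * sum_{j=0}^{m-1} t^j."""
    return t ** (m + 1) - (s - 1) * t ** m - s * sum(t ** j for j in range(m))

def interpolation_rate(s, m, tol=1e-14):
    """Positive root of g found by bisection on [0, s + 1]; requires g(0) < 0."""
    lo, hi = 0.0, float(s + 1)
    if rate_polynomial(lo, s, m) >= 0:
        raise ValueError("no sign change on [0, s + 1] for these s, m")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_polynomial(mid, s, m) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative parameter choices (assumptions, not taken from the text):
for s, m in [(3, 0), (2, 1), (1, 2)]:
    print(s, m, interpolation_rate(s, m))
```

For instance, $s = 3$, $m = 0$ (value, gradient and Hessian matched at a single point, as in Newton's method) gives $p = 2$, while $s = 1$, $m = 2$ gives the root of $t^3 - t - 1 = 0$, approximately 1.3247.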


To show that quasi-Newton methods as defined above can be regarded as
interpolatory algorithms, we now characterize them by the requirements

(9)   $T(x_k) = f(x_k)$

(10)  $\nabla T(x_k) = \nabla f(x_k)$

(11)  $\nabla T(x_{k-1}) = \nabla f(x_{k-1})$

and

(12)  $\nabla T(x_{k+1}) = 0$,

where $T$ is the quadratic interpolation function

(13)  $T(x) = f(x_k) + (x - x_k)^T \nabla f(x_k) + \frac{1}{2}(x - x_k)^T B_k (x - x_k)$,

and where $B_k$ is a symmetric nonsingular $n \times n$ matrix, and $a^T$ stands for
the transpose of the vector $a$.

Indeed, if $T$ is defined by (13), equation (9) holds and

(14)  $\nabla T(x) = \nabla f(x_k) + B_k(x - x_k)$,

which implies (10). Using (14) in (12) we have
$\nabla f(x_k) + B_k(x_{k+1} - x_k) = 0$, which is equivalent to (5). Finally the
requirement (11) is equivalent to

$\nabla f(x_k) + B_k(x_{k-1} - x_k) = \nabla f(x_{k-1})$,

which is the secant equation (6) with $k$ replaced by $k-1$.
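These equivalences can be checked numerically for $n = 2$. The sketch below is an illustration under stated assumptions: the test function, the two points, and the particular matrix $B_k$ (built from the identity by a BFGS-type formula so that it is symmetric and maps $s_{k-1}$ to $y_{k-1}$) are all hypothetical choices, and any other symmetric nonsingular matrix satisfying the secant equation would serve equally well.

```python
import math

def grad_f(x):
    # Hypothetical test function f(x) = exp(x1) - x1 + x1*x2 + x2^2
    return [math.exp(x[0]) - 1.0 + x[1], x[0] + 2.0 * x[1]]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def secant_matrix(s, y):
    """Symmetric 2x2 B with B s = y: B = I - s s^T/(s^T s) + y y^T/(y^T s)."""
    ss = s[0] * s[0] + s[1] * s[1]
    ys = y[0] * s[0] + y[1] * s[1]
    return [[(1.0 if i == j else 0.0) - s[i] * s[j] / ss + y[i] * y[j] / ys
             for j in range(2)] for i in range(2)]

def solve2(M, b):
    """Solve the 2x2 system M z = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]

x_prev, x_curr = [0.5, 0.3], [0.2, 0.1]          # x_{k-1}, x_k (arbitrary)
g_prev, g_curr = grad_f(x_prev), grad_f(x_curr)
s = [a - b for a, b in zip(x_curr, x_prev)]      # s_{k-1}
y = [a - b for a, b in zip(g_curr, g_prev)]      # y_{k-1}
B = secant_matrix(s, y)                          # plays the role of B_k

def grad_T(x):
    """Equation (14): gradient of the quadratic model built at x_k."""
    step = [a - b for a, b in zip(x, x_curr)]
    return [g + c for g, c in zip(g_curr, matvec(B, step))]

x_next = [a - b for a, b in zip(x_curr, solve2(B, g_curr))]   # step (5)
print(grad_T(x_prev), g_prev)   # requirement (11): the two should agree
print(grad_T(x_next))           # requirement (12): should be numerically zero
```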

So far we have interpreted all quasi-Newton algorithms as interpolatory
algorithms. Note that (9)-(11) do not define hyperosculatory interpolation, since we
do not require $T(x_{k-1}) = f(x_{k-1})$; therefore our results in [1] do not apply
directly to the algorithm (9)-(12). For $n = 1$ the algorithm is precisely the secant
method, which is well known to have convergence order $(1 + \sqrt{5})/2$. We will now
show that the rate of convergence of a class of quasi-Newton methods is induced by the
underlying one-dimensional secant algorithm.

First we note that equation (9) is redundant. Indeed, equations (10)-(13)
are sufficient to define the sequence $\{x_k\}$, for if $T(x)$ satisfies (9)-(13)
and $T_1(x) = T(x) + a$ with $a \in R$, equation (9) may no longer hold for $T_1(x)$,
but $\nabla T_1(x) = \nabla T(x)$ will produce the same value for $x_{k+1}$.

As in [1], we derive the basic difference equation we need by passing a curve
in $R^n$ through the points $x_{k-1}, x_k, x_{k+1}, x^*$, i.e., we determine a
function $\psi: R \to R^n$ such that

(15)  $\psi(t_j) = x_j$  $(j = k-1, k, k+1)$,  $\psi(t^*) = x^*$,

where the parameter $t$ is chosen so that

(16)  $t_j = \|x_j - x^*\|$,  $t^* = \|x^* - x^*\| = 0$.

This can evidently be done in infinitely many ways. We will later specify further
restrictions on $\psi$. Defining $\Theta(t) = T(\psi(t))$, $\Phi(t) = f(\psi(t))$ and
$\theta(t) = \Theta'(t)$, $\varphi(t) = \Phi'(t)$, we have from (10)-(12)

(17)  $\theta(t_k) = \varphi(t_k)$

(18)  $\theta(t_{k-1}) = \varphi(t_{k-1})$

(19)  $\theta(t_{k+1}) = 0$

(20)  $\varphi(0) = 0$



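(23)  $\varphi(t) - \theta(t) = \frac{\varphi^{(2)}(\xi) - \theta^{(2)}(\xi)}{2}\,(t - t_k)(t - t_{k-1})$

with $\xi = \xi(t)$ in the interval spanned by $t$, $t_k$ and $t_{k-1}$. By (19) we have
$-\theta(0) = \theta(t_{k+1}) - \theta(0) = t_{k+1}\theta'(\zeta)$ with $\zeta$ between
$t_{k+1}$ and 0. Setting $t = 0$ in (23) and denoting $\xi_k = \xi(0)$ we therefore have

$t_{k+1}\,\theta'(\zeta) = \frac{\varphi^{(2)}(\xi_k) - \theta^{(2)}(\xi_k)}{2}\, t_k t_{k-1}$,

which completes the proof.  □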


Our main result now follows from equation (21).

Theorem 2.  Let $f \in C^{(3)}$ in a neighborhood of the solution $x^*$. If
$\nabla^2 f(x^*)$ is positive definite, and if the sequence $\{B_k\}$ is bounded,
then there exists a neighborhood $N$ of $x^*$, such that for all $x_0, x_1 \in N$,
the sequence $\{x_k\}$ generated by the quasi-Newton algorithm converges to $x^*$
with Q- and R-rates of convergence at least $(1 + \sqrt{5})/2$.


Proof.  This is an immediate consequence of the difference equation (21), if the
sequence $\{A_k\}$ is bounded (see e.g. [6] or [11] and [2]).

Under the assumptions of the theorem and by definition of the functions $\theta$,
$\varphi$, it is therefore sufficient to show that the curve $\psi$ can be chosen so
that the derivatives of $\psi$ are bounded at $t = 0$, and $\varphi'(0) \ne 0$.

Note that $\psi$ is used to derive equation (21), but its construction is not a
part of the algorithm. Assuming without loss of generality
$\partial^2 f(x^*)/\partial x_1^2 \ne 0$, and since
$\varphi'(0) = \psi'(0)^T \nabla^2 f(x^*) \psi'(0)$, one can satisfy (15) and
$\varphi'(0) \ne 0$ by choosing

$\psi_i(t) = \sum_{j=0}^{r} a_{ij} t^j$  $(i = 1, \ldots, n)$

with $a_{11} = 1$, $a_{i1} = 0$, $i = 2, \ldots, n$. This completes the proof.  □
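The way a two-term difference relation produces the golden section ratio can be seen numerically. The sketch below rests on an assumption: it takes the relation in the model form $t_{k+1} = A\, t_k t_{k-1}$ with a constant $A$, which is the shape suggested by the last display in the preceding proof but is not quoted from equation (21). The starting values, the value of $A$ and the function name are hypothetical. Working with $L_k = -\log t_k$ avoids floating point underflow.

```python
import math

def log_error_ratios(t0, t1, A, steps):
    """Iterate t_{k+1} = A t_k t_{k-1} via L_k = -log t_k; return L_{k+1}/L_k."""
    L_prev, L_curr = -math.log(t0), -math.log(t1)
    shift = -math.log(A)
    ratios = []
    for _ in range(steps):
        # L_{k+1} = L_k + L_{k-1} - log A (right-hand side uses the old values)
        L_prev, L_curr = L_curr, L_curr + L_prev + shift
        ratios.append(L_curr / L_prev)
    return ratios

ratios = log_error_ratios(t0=0.5, t1=0.4, A=0.5, steps=40)
golden = (1.0 + math.sqrt(5.0)) / 2.0
print(ratios[-1], golden)
```

The ratio $\log t_{k+1} / \log t_k$ approaches $(1 + \sqrt{5})/2$ because the logarithms obey a Fibonacci-type recursion whose dominant root is the positive solution of $p^2 = p + 1$.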




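Theorem 2 holds for all quasi-Newton methods. We now turn our attention to
the so-called Broyden's class of quasi-Newton methods, which are defined by the
updating formula

(24)  $H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \beta_k v_k v_k^T$

with $y_k$, $s_k$ defined by (3),

$v_k = (y_k^T H_k y_k)^{1/2} \left( \frac{s_k}{s_k^T y_k} - \frac{H_k y_k}{y_k^T H_k y_k} \right)$

and $\beta_k \in [0, 1]$.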


Evidently  boundedness  of  and  i6  equivalent 
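A short numerical check shows that every member of this class satisfies the secant equation (2). The sketch below is illustrative: it writes the update as the rank-two correction $s s^T/(s^T y) - H y y^T H/(y^T H y)$ plus a scalar multiple of $v v^T$, with the scalar called `beta` here (a name chosen for this sketch), and the vectors `s`, `y` are arbitrary test data with $s^T y > 0$. In this standard parametrization `beta = 0` is the Davidon-Fletcher-Powell update and `beta = 1` the Broyden-Fletcher-Goldfarb-Shanno update.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def broyden_class_update(H, s, y, beta):
    """One update of the inverse Hessian approximation H (assumed symmetric)."""
    Hy = matvec(H, y)                 # H y; for symmetric H, y^T H = (H y)^T
    sy = dot(s, y)                    # s^T y
    yHy = dot(y, Hy)                  # y^T H y
    v = [math.sqrt(yHy) * (si / sy - hi / yHy) for si, hi in zip(s, Hy)]
    n = len(s)
    return [[H[i][j] + s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy + beta * v[i] * v[j]
             for j in range(n)] for i in range(n)]

H0 = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical starting matrix
s = [1.0, 0.5]                        # hypothetical step
y = [2.0, 0.3]                        # hypothetical gradient difference, s^T y > 0
for beta in (0.0, 0.5, 1.0):
    H1 = broyden_class_update(H0, s, y, beta)
    print(beta, matvec(H1, y))        # each line should reproduce s
```

The check works for any `beta` because $v^T y = 0$, so the last term does not affect the product with $y$.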


Theorem 3.  Let $f \in C^{(3)}$ in a neighborhood of the solution $x^*$, and let
$\nabla^2 f(x^*)$ be positive definite. If $x_0$, $x_1$ are close enough to $x^*$,
if $H_0$ is symmetric and positive definite, and if the matrices $H_k$ are updated
by (24), then $x_k \to x^*$ with Q- and R-rates of convergence at least
$(1 + \sqrt{5})/2$.


Proof.  By the mean value theorem we have $y_k = \bar{A}_k s_k$, where
$\bar{A}_k = \nabla^2 f(\bar{x})$ and $\bar{x}$ is on the line segment connecting
$x_k$ and $x_{k+1}$. Fletcher [4] proved that the eigenvalues of
$\bar{A}_k^{1/2} H_k \bar{A}_k^{1/2}$ are bounded. Since we assumed that
$\nabla^2 f$ is continuous and positive definite at $x^*$, the eigenvalues of
$H_k$ are bounded and the result follows from Theorem 2.


3.  Concluding  Remarks 

Under traditional assumptions, we have proved that quasi-Newton methods inherit
their rate of convergence from the underlying secant method (cf. Luenberger [6, §7.2]).
Thus, the assumption in Theorem 8.9 of [3] that equation (8.21) of that paper holds,
is not made here. Similarly, no assumption has been made on the linear independence
of the directions $\{s_k\}$ (cf. Moré and Trangenstein [8]).

We have not broadened our analysis to quasi-Newton methods beyond those
belonging to Broyden's class of updates (and their inverse updates in the sense of [3]),
in order not to obscure the main points in our analysis. The well known
Davidon-Fletcher-Powell and Broyden-Fletcher-Goldfarb-Shanno algorithms fall in this
category. While the latter algorithm is the best available at present, our analysis
in [1] suggests that faster algorithms can be designed utilizing gradient information
only.

Our results extend with the obvious modifications to the problem of solving
$F(x) = 0$, $F: R^n \to R^n$, discussed in the first part of [3]. They also extend to
the infinite dimensional case if the coefficients in the basic difference equation
(21) are bounded.

From our point of view, the rate of convergence of quasi-Newton methods has
nothing to do with their so-called quadratic termination property. It is a
consequence of the data used in the interpolatory equations (7) (see [1,2]). Therefore,
the Huang class of updates [5] is too wide in the sense that it contains updates
which do not satisfy the secant equation. Note also that Theorem 8.10 of [3] is
not interesting in the sense that $1.6^n > 2$ for all $n > 1$.

Finally, note that the common observation that Newton's method is
self-corrective in the sense that $x_{k+1}$ depends explicitly on $x_k$ only, while
quasi-Newton methods carry along bad effects from previous iterations, is not
justified. The fact that quasi-Newton methods are two-point interpolatory algorithms
is exactly their advantage over Newton's method (see [10, §6.4], [1] and [2]).

